Entropy of continuous mixtures and the measure problem 
In its continuous version, the entropy functional measuring the information content of a given
probability density may be plagued by a "measure" problem that results from improper weighting
of phase space. This issue is addressed considering a generic collision process whereby a large
number of particles/agents randomly and repeatedly interact in pairs, with prescribed conservation
law(s). We find a sufficient condition under which the stationary single particle distribution function
maximizes an entropy-like functional that is free of the measure problem. This condition amounts
to a factorization property of the Jacobian associated with the binary collision law, from which the
proper weighting of phase space directly follows.

PACS numbers: 05.20.Dd, 05.20.-y, 02.50.Ng 



In information theory, the definition of the entropy of a continuous probability distribution
depends on the identification of a relevant prior, or weighting function [?], that can prove
elusive. To illustrate this point and motivate our approach, let us consider an ensemble of
particles (each indexed by integer $i$) that can exchange some positive quantity $x$ so that
$\sum_i x_i$ is fixed: two particles $i$ and $j$ chosen at random interact so that
$x_i \to x_i + \eta$ and $x_j \to x_j - \eta$, provided both quantities remain positive.
Here, $\eta$ is a fixed small increment, or can be drawn from a prescribed distribution.
Such a model has appeared in different settings: in the context of mass transport models,
$x$ stands for the mass of the particle [?]; it can also be the position of a composite object
in exclusion processes, the volume of some colloidal aggregate [5], the size of a
self-assembled polymer [6], the wealth of an agent in a simplistic econophysics framework [?],
or an auxiliary quantity used for algorithmic purposes, in particular the generation of
pseudo-random numbers [?]. Upon iterating the previous "collision" rule, it can be shown that
the $x$-distribution reaches the simple stationary probability density function
$\rho_x^{\rm st}(x) = \exp(-x)$, fixing for convenience the mean of $x$ to unity [?].
Following the early work of C. Shannon, this result seems to be readily recovered by
maximizing the information measure -or differential entropy- of the distribution



$$ S_{\rm Sha} = -\int \rho_x(x) \log[\rho_x(x)]\, dx \qquad (1) $$



under the constraint that $\int \rho_x\, dx = \int x\, \rho_x\, dx = 1$ [10].
On the other hand, it is clear that the process could be equally well described by another
quantity $y$ (say, the radius of a colloid instead of its volume), with a corresponding
probability density $\rho_y$ such that $\rho_x(x)\, dx = \rho_y(y)\, dy$.
However, the formulation (1) is not invariant under change of variable, so that a different
and inconsistent distribution would be found by maximizing $-\int \rho_y \log \rho_y\, dy$,
even after taking the constraints into account appropriately. We will refer to this latent
deficiency, already noted in [21], as the "measure problem". In addition, although the
pathological nature of Eq. (1) is made evident by a change of variable, it can also be inferred
from its dimensional inconsistency: the argument of the logarithm, $\rho_x$, carries the
dimension of $1/x$. We conclude that recovering the correct (in our example exponential)
distribution from maximizing (1) is coincidental, and that Eq. (1) does not provide an
admissible information measure. We emphasize that Shannon faced the measure problem and
concluded that the entropy of a continuous distribution is not an absolute measure, but is
relative to the coordinate system. Such a point of view is not acceptable: the entropy should
not have an absolute status for discrete probabilities, and a relative one for continuous
cases.
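The relaxation of the introductory exchange model towards an exponential distribution can be checked numerically. The following is a minimal sketch, not the procedure of the cited references: the function name `simulate_exchange`, the system size, the number of collisions and the choice of a uniform increment distribution for $\eta$ are all illustrative assumptions.

```python
import random


def simulate_exchange(n_particles=5000, n_collisions=1_000_000, eta_max=0.5, seed=0):
    """Iterate x_i -> x_i + eta, x_j -> x_j - eta for randomly chosen pairs.

    A move is rejected when it would make either quantity negative.
    Assumption: eta is drawn uniformly in [-eta_max, eta_max].
    """
    rng = random.Random(seed)
    x = [1.0] * n_particles  # every particle starts at the mean, fixed to unity
    for _ in range(n_collisions):
        i = rng.randrange(n_particles)
        j = rng.randrange(n_particles)
        if i == j:
            continue  # a particle cannot collide with itself
        eta = rng.uniform(-eta_max, eta_max)
        if x[i] + eta >= 0.0 and x[j] - eta >= 0.0:
            x[i] += eta
            x[j] -= eta
    return x


x = simulate_exchange()
mean = sum(x) / len(x)                         # conserved by construction
second_moment = sum(v * v for v in x) / len(x)
print(mean, second_moment)
```

For the density $\exp(-x)$ the second moment is 2, so a value of `second_moment` close to 2 (within statistical noise) is consistent with the exponential steady state, while `mean` stays at 1 because the total is conserved.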

The mechanism for recovering an absolute information measure that is unaffected by a change
of parameterization is clear when the continuous limit is carefully taken from the situation
described by a discrete probability set $\{p_\alpha\}$, where the entropy reads
$-\sum_\alpha p_\alpha \log p_\alpha$. In doing so, it is necessary to introduce the density
of points $m_x(x)$ and one obtains [?]



$$ S = -\int \rho_x(x) \log[\Lambda_x(x)\, \rho_x(x)]\, dx. \qquad (2) $$



In the above expression, that seems to have been first derived and commented on by Jaynes,
the quantity $\Lambda_x$ can be viewed as a weighting function, and it is related to the
density of points by $\Lambda_x(x) = 1/m_x(x)$. This $x$-dependent function indicates how the
space of dynamical variables is resolved [2]: the larger the density $m_x$, the better the
resolution, which corresponds to a smaller $\Lambda_x$. Since the densities $m$ transform
under change of variable as the probability densities $\rho$ do, the coordinate dependence of
$\Lambda$ cures the measure problem. It is therefore essential to understand what this
dependence is, a problem that is quite often overlooked in the literature [?], and that
Jaynes -somewhat ironically- ascribes to the fact that "one could not think of anything else
to do" [15]. Consequently, if the density $m_x$ can be extracted from our knowledge of
$x$-space sampling, the measure problem is solved. This case is that of a "quenched"
$x$-distribution. There are nevertheless situations where this knowledge is not a priori
available, but is encoded in the dynamics of the system ("annealed" $x$-distribution), so
that $\Lambda_x$ is selected by the underlying dynamical rules. Our goal here is to
understand that connection in an annealed context, in order to set up a clear prescription
for writing the relevant entropy.
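The way the weighting function cures the measure problem can be made explicit with a short worked computation. This is a sketch under a stated assumption: the change of variable $y = f(x)$ is smooth and strictly monotonic (the function $f$ is introduced here for illustration only), and $\rho$ and $m$ transform as densities, as stated above.

```latex
% Assumption: y = f(x) is smooth and strictly monotonic.
\begin{align*}
\rho_y(y) &= \rho_x(x) \left|\frac{dx}{dy}\right|, \qquad
m_y(y) = m_x(x) \left|\frac{dx}{dy}\right|
\;\Longrightarrow\;
\Lambda_y(y) = \frac{1}{m_y(y)} = \Lambda_x(x) \left|\frac{dy}{dx}\right|, \\
\Lambda_y(y)\, \rho_y(y) &= \Lambda_x(x)\, \rho_x(x), \\
-\int \rho_y \log[\Lambda_y \rho_y]\, dy &= -\int \rho_x \log[\Lambda_x \rho_x]\, dx, \\
-\int \rho_y \log \rho_y\, dy &= -\int \rho_x \log \rho_x\, dx
  + \int \rho_x \log\left|\frac{dy}{dx}\right| dx .
\end{align*}
```

The third line shows that the functional with the weighting function takes the same value in both coordinate systems, whereas the last line shows that the functional without it picks up a term that depends on the chosen parameterization.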

We are now in a position to state the problem more precisely. We are interested in a
population of a large number $N$ of particles where a given property $x_i$ (mass, velocity,
length, color, income, etc.) is attached to each particle $i$. These particles repeatedly
undergo binary "collisions" where pairs selected at random interact such that
$(x_i, x_j) \to (x'_i, x'_j)$. An important point is that we assume the existence of a
conservation law



$$ C(x_i) + C(x_j) = C(x'_i) + C(x'_j), \qquad (3) $$



where $C$ is a given function. We shall leave ergodicity issues aside, and consider that the
functions $x'_i(x_i, x_j)$ and $x'_j(x_i, x_j)$, which are not specified, are sufficiently
mixing to ensure that all accessible phase space is sampled (in general, non-uniformly). The
objective is to answer the following question. Q: Can we maximize a functional of the form
(2), under the appropriate constraints that $\int \rho_x(x)\, dx$ and
$\int C(x)\, \rho_x(x)\, dx$ are fixed, to obtain the steady state probability distribution
$\rho_x^{\rm st}(x)$, if it exists? If so, simple calculus shows that the latter distribution
is of the form



$$ \rho_x^{\rm st}(x) = \alpha\, \Lambda_x^{-1}(x) \exp[-\beta C(x)], \qquad (4) $$



where $\alpha$ and $\beta$ are irrelevant Lagrange multipliers. The ensuing problem is then
to understand what specifies the weighting function $\Lambda_x$. Indeed, knowing that Q can
be answered affirmatively is of little interest if one does not know the corresponding
weighting function $\Lambda_x(x)$.
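The "simple calculus" leading to the form (4) can be spelled out as follows. This is a sketch of the standard constrained maximization; the multiplier $\lambda_0$ enforcing normalization and the symbol $\mathcal{F}$ for the constrained functional are notations introduced here for illustration.

```latex
% Assumption: \lambda_0 (normalization) and \beta (constraint on \int C \rho_x dx)
% are Lagrange multipliers; \mathcal{F} is a hypothetical name for the constrained functional.
\begin{align*}
\mathcal{F}[\rho_x] &= -\int \rho_x \log[\Lambda_x \rho_x]\, dx
  - \lambda_0 \int \rho_x\, dx - \beta \int C(x)\, \rho_x\, dx, \\
\frac{\delta \mathcal{F}}{\delta \rho_x(x)} &=
  -\log[\Lambda_x(x)\, \rho_x(x)] - 1 - \lambda_0 - \beta C(x) = 0, \\
\rho_x(x) &= e^{-1-\lambda_0}\, \Lambda_x^{-1}(x) \exp[-\beta C(x)]
  \equiv \alpha\, \Lambda_x^{-1}(x) \exp[-\beta C(x)] .
\end{align*}
```

The constants $\alpha$ and $\beta$ are then fixed by the two constraints, which is why they play no further role in the argument.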

The collision law considered may violate detailed balance, and it may involve an additional
stochastic parameter $\eta$, as for instance in the simple example introduced in [?], that we
mention as a warm-up exercise:



$$ x'_1 = \eta\, (x_1 + x_2)/\sqrt{2}, \qquad x'_2 = \eta\, (x_1 - x_2)/\sqrt{2}, \qquad (5) $$



where $\eta$ equiprobably takes values $\pm 1$. Likewise, randomness is necessarily
introduced for colliding hard bodies, as a remnant of the impact parameter in a description
that only considers the velocity degrees of freedom, as routinely done in some Monte Carlo
simulation techniques. It should be clear from the outset that the conserved quantity is in
general not exponentially distributed, as our simplistic introductory example might lead one
to believe. Indeed, considering Eq. (5) that conserves "energy" $e = x^2$, i.e.
$C(x) = x^2$, it appears that $\rho_e^{\rm st}(e) \propto \exp(-\beta e)/\sqrt{e}$ in the
steady state [17], where $\beta$ is some inverse temperature. A naive application of Eq. (1),
on the other hand, leads to the incorrect result $\rho_e(e) \propto \exp(-\beta e)$. This
means that here, $\Lambda_e(e) \propto \sqrt{e}$, and we learn from this simple example that
the conservation law is not sufficient, in general, to obtain the relevant $\Lambda$. This
key quantity is encoded in the transformation law $(x_i, x_j) \to (x'_i, x'_j)$, in a way
that we now bring to the fore.
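The difference between the two candidate steady states of the toy model can be probed with a small simulation. The sketch below is illustrative only: the function name `simulate_toy`, the system size, the number of collisions and the initial condition are assumptions. It relies on the fact that a density proportional to $\exp(-\beta e)/\sqrt{e}$ with unit mean has second moment 3 (it corresponds to a Gaussian $x$), whereas a pure exponential with unit mean has second moment 2.

```python
import math
import random


def simulate_toy(n_particles=20_000, n_collisions=300_000, seed=1):
    """Iterate x1' = eta (x1 + x2)/sqrt(2), x2' = eta (x1 - x2)/sqrt(2) on random pairs.

    Assumption: every particle starts at x = +1 or -1, so that the mean
    "energy" e = x^2 is fixed to unity.
    """
    rng = random.Random(seed)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n_particles)]
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    for _ in range(n_collisions):
        i = rng.randrange(n_particles)
        j = rng.randrange(n_particles)
        if i == j:
            continue  # a particle cannot collide with itself
        eta = rng.choice((-1.0, 1.0))  # eta takes values +1 or -1 with equal probability
        xi, xj = x[i], x[j]
        x[i] = eta * (xi + xj) * inv_sqrt2
        x[j] = eta * (xi - xj) * inv_sqrt2
    return x


x = simulate_toy()
energies = [v * v for v in x]
mean_e = sum(energies) / len(energies)                   # conserved by the collision rule
mean_e2 = sum(e * e for e in energies) / len(energies)   # distinguishes the two candidates
print(mean_e, mean_e2)
```

A value of `mean_e2` near 3 rather than 2 is consistent with the weighted form $\exp(-\beta e)/\sqrt{e}$ and not with the naive exponential.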

To get an idea of the connection ($\Lambda \leftrightarrow$ dynamics), we first restrict to
the subclass of processes that fulfill detailed balance. The corresponding single-particle
distribution then obeys

$$ \rho_x^{\rm st}(x_1)\, \rho_x^{\rm st}(x_2)\, dx_1\, dx_2 = \rho_x^{\rm st}(x'_1)\, \rho_x^{\rm st}(x'_2)\, dx'_1\, dx'_2, \qquad (6) $$

where, due to the mean-field-like sampling procedure with randomly chosen pairs, the
two-particle probability distribution factorizes for large $N$ into a product of single
particle distributions (a more technical proof will be outlined below). On the other hand,
assuming that Q can be answered positively, the stationary single particle distribution
$\rho_x^{\rm st}$ is constrained to be of the form (4). Then, from Eqs. (4), (6) and the
conservation law, Eq. (3), we find that the Jacobian $J_x$ of the transformation
$(x_1, x_2)$ to $(x'_1, x'_2)$ admits a factorized form



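$$ J_x(x_1, x_2) = \left| \det \frac{\partial(x'_1, x'_2)}{\partial(x_1, x_2)} \right| = \frac{\Lambda_x(x'_1)\, \Lambda_x(x'_2)}{\Lambda_x(x_1)\, \Lambda_x(x_2)}. \qquad (7) $$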



We emphasize here that the Jacobian is defined for a given value of the stochasticity
parameter $\eta$: $x'_1$ and $x'_2$ are functions of $x_1$, $x_2$, and $\eta$.

We now arrive at the main part of our argument: we will show below that if the factorization
property (7) of $J_x$ holds (without any other restriction, such as detailed balance), then
the stationary distribution function $\rho_x^{\rm st}$ is of the form (4), and hence we are
able to answer question Q affirmatively. In addition, the relevant weighting function
$\Lambda_x$ can then be directly read from (7). This is interesting from an operational point
of view, since the Jacobian directly follows from the knowledge of the collision law, which
is an input of the model. As an illustration, we return to the toy model of Eq. (5), recast
in the conserved variable $e = x^2$. We have
$J_e(e_1, e_2) = [e'_1 e'_2/(e_1 e_2)]^{1/2}$, which is of the form (7), with
$\Lambda_e(e) \propto \sqrt{e}$. This immediately leads to the correct distribution
$\rho_e^{\rm st}(e) \propto \exp(-\beta e)/\sqrt{e}$.
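The factorized Jacobian of the toy model in the energy variable can be verified numerically. The sketch below is an illustration under stated assumptions: it works on the branch $x_1 > x_2 > 0$ with $\eta = +1$ (so that the map between $x$ and $e = x^2$ is one-to-one), and the function names and the finite-difference step are hypothetical choices.

```python
import math


def collide_energy(e1, e2):
    """Toy collision written in e = x^2, on the branch x1 > x2 > 0 with eta = +1."""
    x1, x2 = math.sqrt(e1), math.sqrt(e2)
    return 0.5 * (x1 + x2) ** 2, 0.5 * (x1 - x2) ** 2


def numerical_jacobian(e1, e2, h=1e-6):
    """Central finite-difference estimate of |det d(e1', e2')/d(e1, e2)|."""
    a_plus, c_plus = collide_energy(e1 + h, e2)
    a_minus, c_minus = collide_energy(e1 - h, e2)
    b_plus, d_plus = collide_energy(e1, e2 + h)
    b_minus, d_minus = collide_energy(e1, e2 - h)
    de1p_de1 = (a_plus - a_minus) / (2 * h)
    de2p_de1 = (c_plus - c_minus) / (2 * h)
    de1p_de2 = (b_plus - b_minus) / (2 * h)
    de2p_de2 = (d_plus - d_minus) / (2 * h)
    return abs(de1p_de1 * de2p_de2 - de1p_de2 * de2p_de1)


def factorized_form(e1, e2):
    """Right-hand side of the factorization with Lambda_e(e) = sqrt(e)."""
    e1p, e2p = collide_energy(e1, e2)
    return math.sqrt(e1p * e2p / (e1 * e2))


print(numerical_jacobian(2.0, 0.5), factorized_form(2.0, 0.5))
```

The two printed numbers should agree up to the finite-difference error, which is what the factorization with $\Lambda_e(e) \propto \sqrt{e}$ requires.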

We now proceed with our general proof, that starts with assuming property (7) for $J_x$, and
that involves the following three steps.

a) We introduce a new set of variables, under the mild assumption that $\Lambda_x$ in (7) is
non-vanishing. Indeed, with the function of $x$ defined by
$z(x) = \int^x dx' / \Lambda_x(x')$, the Jacobian of the collision law becomes unity
($dz'_1\, dz'_2 = dz_1\, dz_2$), which simplifies the kinetic theory description.
b) Although our aim is to derive the stationary single particle distribution function
$\rho_z^{\rm st}$, working at $N$-body level with the phase space density
$\rho_N(\Gamma, t)$, where $\Gamma = (z_1, \ldots, z_N)$, turns out to be a convenient
detour. This distribution obeys the following evolution equation [?]



$$ \partial_t \rho_N(\Gamma, t) = \sum_{i<j}^{N} \int d\eta\, w(\eta) \left[ b_{ij}^{-1} - 1 \right] \rho_N(\Gamma, t), \qquad (8) $$



where the random variable $\eta$ with distribution $w$ enters the collision law (see above),
that can be described by the inverse collision operator $b_{ij}^{-1}$. This operator acts on
the distribution on its right by replacing the arguments $z_i$ and $z_j$ by their
precollisional values $z_i^*$ and $z_j^*$:



$$ b_{12}^{-1}\, \rho_N(\Gamma, t) = \rho_N(z_1^*, z_2^*, z_3, \ldots, z_N, t), \qquad (9) $$



with $(z^*_{i,j})' = z_{i,j}$. The present description in terms of $z$ quantities is also
endowed with a conservation law, that we write here -modulo a slight abuse of notation- with
the same function $C$ as in Eq. (3): $\sum_i C(z_i) = \mathcal{C}$. It is then
straightforward to see that the distribution
$\rho_N^{\rm st} \propto \delta(\mathcal{C} - \sum_i C(z_i))$ (with proper normalization)
provides a stationary solution to Eq. (8). The corresponding single particle distribution
function follows from computing the first marginal
$\rho_z^{\rm st}(z_1) \propto \int \rho_N^{\rm st}\, dz_2 \cdots dz_N$. The argument is akin
to that put forward to construct the canonical ensemble from the micro-canonical distribution
[20, ?], and leads to $\rho_z^{\rm st}(z) \propto \exp[-\beta C(z)]$.

c) The last important step in the argument is to show that the $N$-body measure is
attractive, at long times, for arbitrary initial conditions sharing the same value of
$\mathcal{C}$. For this purpose, we borrow a technique introduced in [21] and consider an
arbitrary strictly convex positive function $h(x)$ from which we construct



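$$ H(t) = \int d\Gamma\, \rho_N^{\rm st}(\Gamma)\, h[\rho_N(\Gamma, t)]. \qquad (10) $$

The evolution equation (8) implies

$$ \frac{dH}{dt} = \frac{N(N-1)}{2} \int d\Gamma\, d\eta\, \rho_N^{\rm st}(\Gamma)\, w(\eta) \left\{ h'[\rho_N(\Gamma, t)] \left[ \rho_N(b_{12}^{-1}\Gamma, t) - \rho_N(\Gamma, t) \right] + h[\rho_N(\Gamma, t)] - h[\rho_N(b_{12}^{-1}\Gamma, t)] \right\}, \qquad (11) $$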



where we have used the invariance under permutation of 
particle indices and that 

$$ \int d\Gamma\, d\eta\, \rho_N^{\rm st}(\Gamma)\, w(\eta) \left\{ h[\rho_N(\Gamma, t)] - h[\rho_N(b_{12}^{-1}\Gamma, t)] \right\} = 0. \qquad (12) $$

From the convexity of $h$, we have that $H$ is a non-increasing function of time. Since $H$
is in addition bounded from below, we have established that it converges at long times to a
constant [22]. Moreover, as the curly bracket of Eq. (11) only vanishes when
$\rho_N(\Gamma, t) = \rho_N(b_{12}^{-1}\Gamma, t)$, we conclude from our ergodicity
assumption that all initial phase space densities for which the conserved quantity is
strictly equal to $\mathcal{C}$ evolve towards $\rho_N^{\rm st}$. This is a flat and finite
measure on the ensemble defined by $\sum_i C(z_i) = \mathcal{C}$ and can be seen as a
generalized micro-canonical density. A similar property also applies to the first marginal
$\rho_z(z, t)$ that is then attracted to $\rho_z^{\rm st} \propto \exp[-\beta C(z)]$.
Returning to the original variable $x$ in which the problem was formulated, and bearing in
mind that $dz/dx = \Lambda_x^{-1}(x)$, this yields the desired result that the stationary
distribution $\rho_x^{\rm st}(x)$ is of the form (4). Incidentally, we also obtain here that
the 2-body distribution $\rho_x^{(2)}(x_1, x_2)$ factorizes in the steady state into the
product $\rho_x^{\rm st}(x_1)\, \rho_x^{\rm st}(x_2)$, as mentioned above.
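The step from the convexity of $h$ to the monotonic decay of $H$ in step c) can be made explicit. The following is a short worked inequality; the shorthands $a$ and $b$ are introduced here for illustration and do not appear in the original derivation.

```latex
% Shorthand (illustrative): a = \rho_N(\Gamma, t), \quad b = \rho_N(b_{12}^{-1}\Gamma, t).
\begin{align*}
h(b) &\ge h(a) + h'(a)\,(b - a)
  && \text{(convexity; equality only for } a = b \text{ if } h \text{ is strictly convex)}, \\
h'(a)\,(b - a) + h(a) - h(b) &\le 0
  && \text{(the curly bracket of the evolution equation for } H\text{)}, \\
\frac{dH}{dt} &\le 0
  && \text{(since } \rho_N^{\rm st}(\Gamma)\, w(\eta) \ge 0\text{)} .
\end{align*}
```

The integrand is thus non-positive everywhere, and it vanishes only where the density is unchanged by the inverse collision operator, which is the property used above to identify the long-time limit.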

So far, we have shown that under the assumption (7), the steady-state distribution
$\rho_x^{\rm st}$ can be found by maximizing a functional of the form (2), with a known
weighting function $\Lambda_x$, directly read from (7). This was illustrated by the toy
dynamics (5), but our introductory example may also be understood in that framework: with
the dynamics defined by $(x_i, x_j) \to (x_i + \eta, x_j - \eta)$, for which $C(x) = x$, we
simply have $J_x(x_1, x_2) = 1$, hence $\Lambda_x = 1$ and
$\rho_x^{\rm st}(x) \propto \exp(-\beta x)$.

To complete the analysis, three remarks are in order. First, while we have restricted to the
scalar case for the sake of simplicity, $x$ can equally be a vectorial quantity. Second,
Eq. (7), when it applies, does not define a unique function $\Lambda_x$. Indeed, consider two
candidates $\Lambda_x$ and $\tilde{\Lambda}_x$ obeying



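$$ \frac{\Lambda_x(x'_1)\, \Lambda_x(x'_2)}{\Lambda_x(x_1)\, \Lambda_x(x_2)} = \frac{\tilde{\Lambda}_x(x'_1)\, \tilde{\Lambda}_x(x'_2)}{\tilde{\Lambda}_x(x_1)\, \tilde{\Lambda}_x(x_2)} \qquad (13) $$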



for all $x_1$, $x_2$, and $\eta$. Then, $\log(\Lambda_x/\tilde{\Lambda}_x)$ is a collisional
invariant. Assuming that there is no "hidden" conservation law, we have
$\log(\Lambda_x(x)/\tilde{\Lambda}_x(x)) = a + b\, C(x)$, where $a$ and $b$ are arbitrary
constants. So, a candidate weighting function defined through (7) is prescribed up to a
function $\exp(a + b\, C(x))$. Such a freedom in the choice of $\Lambda$ only shifts the
functional (2) by the constant $a + b \langle C \rangle$. The final result for
$\rho_x^{\rm st}$ is hence not affected by the choice made for $\Lambda_x$.
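The claim that this freedom only shifts the functional by a constant can be checked in one line. The sketch below uses the notation $S[\Lambda]$ for the functional (2) evaluated with a given weighting function and a tilde for the second candidate; both notations are introduced here for illustration.

```latex
% Assumption: \Lambda_x(x) = \tilde{\Lambda}_x(x)\, \exp[a + b\, C(x)],
% with \int \rho_x dx = 1 and \langle C \rangle = \int C(x)\, \rho_x(x)\, dx held fixed.
\begin{align*}
S[\Lambda] &= -\int \rho_x \log[\Lambda_x \rho_x]\, dx
  = -\int \rho_x \log[\tilde{\Lambda}_x \rho_x]\, dx - \int \rho_x\, [a + b\, C(x)]\, dx \\
  &= S[\tilde{\Lambda}] - a - b\, \langle C \rangle .
\end{align*}
```

Since normalization and $\langle C \rangle$ are precisely the quantities held fixed in the maximization, the two functionals differ by a constant and share the same maximizer.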

Third, it seems worthwhile to provide a more intuitive understanding of the fact that if the
Jacobian of $(x_1, x_2) \to (x'_1, x'_2)$ fulfills Eq. (7), then the corresponding weighting
function is proportional to the $\Lambda_x$ appearing in (7). For any pair $(x_1, x_2)$, (7)
implies that the respective uncertainties $\delta x_1$ and $\delta x_2$ will be affected by
the collision such that
$\Lambda_x(x_1)^{-1}\, \delta x_1\, \Lambda_x(x_2)^{-1}\, \delta x_2 = \Lambda_x(x'_1)^{-1}\, \delta x'_1\, \Lambda_x(x'_2)^{-1}\, \delta x'_2$.
At long times, the system reaches a state where the solution to the above constraint is
simply $\delta x/\Lambda_x(x) = {\rm cst}$, so that at one body level, the space of dynamical
variables is resolved with an $x$-dependent precision $\delta x \propto \Lambda_x(x)$. This
allows one to view $\Lambda_x$ as an $x$-dependent volume in the space of dynamical
variables, that quantifies the "graining" with which the space is resolved. Alternatively,
this argument shows that the density of points generated in $x$-space verifies
$m_x(x) \propto 1/\delta x \propto 1/\Lambda_x(x)$, as already mentioned.

To summarize, we have studied a class of problems encountered in different contexts, such as
soft matter, where a mixture of polydisperse hard spheres [24], hard rods [25], or ring
polymers [?] have been shown to exhibit a condensation in real space, stochastic mass
transport models [?], or in mathematical literature where the Kac walk [21] is an important
kinetic theory toy model for studying the propagation of chaos and rate of equilibration
[26]. Specifically, our goal here was to analyze under which conditions the steady state
distribution obtained by iterating a generic collision process with conservation law
[Eq. (3)] could equivalently be obtained from a maximum entropy argument by extremalizing a
given functional of the type (2). We have found that this is the case if the Jacobian of the
collision law $(x_1, x_2) \to (x'_1, x'_2)$ can be written as in (7), from which the relevant
weighting function $\Lambda_x(x)$ can be extracted, which simply provides
$\rho_x^{\rm st}$. This is for example the case of Refs. [?]-[25]. The connection thereby
established is free of the so-called measure problem, that plagues a naive writing of the
entropy functional as in (1), an expression first proposed by Shannon, and that propagated in
a significant fraction of the literature. Our analysis, in other words, provides the correct
prior $\Lambda_x$ that should be considered, see Eq. (2). A key point is that the
configurations allowed by the conservation law(s) are in general sampled non-uniformly. This
non-uniformity, encoded in the $x$ dependence of $\Lambda_x$, that gives different weights to
different points in $x$-space, is the feature ensuring that the information measure
considered is absolute, and does not depend on the parameterization chosen. We finally note
that our approach -which includes multiple conservation laws- can be generalized to more
complex collisional processes, involving more than two bodies, or in which the collision
frequency $\omega(x_1, x_2)$, chosen constant here for the sake of simplicity, actually
depends on the pair considered, as long as $\omega(x_1, x_2) = \omega(x'_1, x'_2)$.